Privacy-preserving heterogeneous health data sharing

Authors

  • Noman Mohammed
  • Xiaoqian Jiang
  • Rui Chen
  • Benjamin C. M. Fung
  • Lucila Ohno-Machado
Abstract

OBJECTIVE: Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing relational and set-valued data in a privacy-preserving manner separately. In this paper, we propose an algorithm that considers both relational and set-valued data in differentially private disclosure of healthcare data.

METHODS: The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (i.e., a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy.

RESULTS: We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis.

LIMITATION: The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases.

CONCLUSIONS: Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.
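The generalize-then-add-noise idea from the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: the abstract says the generalization is chosen probabilistically, whereas this sketch assumes a fixed, data-independent generalization and spends the whole budget ε on Laplace noise. The age buckets, the diagnosis-code taxonomy, and all function names are hypothetical. Each record has one relational attribute (age) and one set-valued attribute (a set of codes); after generalization each record falls into exactly one cell, so adding or removing a record changes one count by 1 and Laplace noise with scale 1/ε suffices.

```python
import random
from collections import Counter
from itertools import combinations, product

# Hypothetical generalization hierarchy (not taken from the paper).
AGE_BUCKETS = [(0, 39), (40, 64), (65, 120)]
CODE_GROUPS = {"E10": "endocrine", "E11": "endocrine",
               "I10": "circulatory", "I21": "circulatory"}

# Generalized output domain: every age bucket paired with every subset of code groups.
GROUPS = sorted(set(CODE_GROUPS.values()))
SUBSETS = [frozenset(s) for r in range(len(GROUPS) + 1)
           for s in combinations(GROUPS, r)]
DOMAIN = list(product(AGE_BUCKETS, SUBSETS))


def generalize(record):
    """Map a raw (age, set of codes) record to its generalized cell."""
    age, codes = record
    bucket = next(b for b in AGE_BUCKETS if b[0] <= age <= b[1])
    return bucket, frozenset(CODE_GROUPS[c] for c in codes)


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_counts(records, epsilon, rng=None):
    """Count records per generalized cell and add Laplace(1/epsilon) noise.

    Noise is added to every cell of the domain, including empty ones,
    so the set of published cells does not depend on the data.
    """
    if rng is None:
        rng = random.Random()
    counts = Counter(generalize(r) for r in records)
    return {cell: counts.get(cell, 0) + laplace(1 / epsilon, rng)
            for cell in DOMAIN}


records = [(45, {"E11", "I10"}), (52, {"E10", "I21"}), (70, {"I10"})]
released = noisy_counts(records, epsilon=1.0, rng=random.Random(0))
```

The generalized domain here has 12 cells, far fewer than a contingency table over every raw age and every subset of raw codes, which is the motivation the abstract gives for generalizing before adding noise.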

Similar resources

Framework Design and Case Study for Privacy-Preserving Medical Data Publishing

With the pervasive use of Electronic Medical Records (EMR) and telemedicine technologies, more and more digital healthcare data are accumulated from multiple sources. As healthcare data are valuable for both commercial and scientific research, the demand for sharing healthcare data has been growing rapidly. Nevertheless, healthcare data normally contain a large amount of personal information,...

A Framework for Privacy-Preserving Medical Document Sharing

Health information systems have greatly increased the availability of medical documents and benefited healthcare management and research. However, there are growing concerns about privacy in sharing medical documents. Existing approaches for privacy-preserving data sharing deal mostly with structured data. Current privacy techniques for unstructured medical text focus on detection and removal of pat...

Privacy-preserving Sanitization in Data Sharing

Preserving privacy in shared provenance data

Provenance management still lacks robust models for sharing provenance data between multiple parties while keeping parts of it private to the owner. This limits the potential for provenance dissemination, which is a critical step in enabling data sharing amongst partners with limited a priori mutual trust. In turn, this has a negative impact on data-intensive science and its associated research...

Presenting a new privacy-preserving data publishing method aimed at improving classification accuracy on anonymized data

Data collection and storage have been facilitated by the growth in electronic services, which has led to vast amounts of personal information being recorded in the databases of public and private organizations. These records often include sensitive personal information (such as income and diseases) and must be protected from access by others. But in some cases, mining the data and extraction of knowledge from thes...

Journal:
  • Journal of the American Medical Informatics Association : JAMIA

Volume 20, Issue 3

Pages: -

Publication date: 2013